Add ability to specify GPUs by UUID prefix #923
|
Hey, I've left this in draft because I'm a little unsure on how to proceed given the lack of real-world reproduction to verify against. I also have a smaller question on whether I should add the documentation change to this change or do it in a followup. |
|
Taking out of draft for visibility and hopefully feedback: I still have some questions from #923 (comment) |
|
@benoit-cty since you were kind enough to review my previous PR, could you advise me on how to proceed? |
|
Hello @cianc, thanks for your contribution. This project is maintained only by volunteers, so please expect some delay in the review. |
…s to track. This is to address mlco2#873. Prior to this change you could only pass an index into the list of GPUs on the system. Now you can pass a UUID prefix, including the 'MIG-' prefix per https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables if desired. Note that I have not been able to test this on a real-life repro. The reporter of mlco2#873 was not able to provide one.
8f7f5f7 to 87f41ab
|
Docs updated. |
|
Nice, thanks! Unfortunately the … See https://github.com/mlco2/codecarbon/blob/master/CONTRIBUTING.md#build-documentation-%EF%B8%8F for more info. |
…he matching html file.
|
Whoops, that was embarrassing! Fixed. |
Description
Add the ability to pass UUID prefixes as a way of specifying what GPUs to track.
This is to address #873
Prior to this change you could only pass an index into the list of GPUs on the system.
Now you can pass a UUID prefix, including the 'MIG-' prefix per
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables
if desired.
This change likely requires a documentation change to
https://mlco2.github.io/codecarbon/parameters.html. I am planning to do that in a follow up
change, but can add it here if you'd like.
Note that I have not been able to test this on a real life reproduction. The reporter of
#873 was not able to provide one.
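For illustration, here is a minimal sketch of the intended matching behavior. It is not the code in this PR: the function name `resolve_gpu_ids`, the sample UUID strings, and the choice to raise on an ambiguous or unmatched prefix are all assumptions made for the example.

```python
from typing import List, Sequence, Union


def resolve_gpu_ids(
    requested: Sequence[Union[int, str]], uuids: Sequence[str]
) -> List[int]:
    """Map each requested id (an index or a UUID prefix) to a device index.

    `uuids` is the list of device UUID strings in index order, for example
    ["GPU-8932f937-...", "MIG-..."]. Hypothetical helper, for illustration only.
    """
    resolved: List[int] = []
    for item in requested:
        text = str(item).strip()

        # Plain integers keep the old behavior: they are indices.
        if text.isdigit():
            index = int(text)
            if index >= len(uuids):
                raise ValueError(f"GPU index {index} is out of range")
            resolved.append(index)
            continue

        # Anything else is treated as a UUID prefix (assumption: the prefix
        # must identify exactly one device).
        matches = [i for i, uuid in enumerate(uuids) if uuid.startswith(text)]
        if len(matches) != 1:
            raise ValueError(
                f"GPU id {text!r} matched {len(matches)} devices, expected 1"
            )
        resolved.append(matches[0])
    return resolved
```

With made-up UUIDs `["GPU-aaaa1111", "GPU-bbbb2222", "MIG-cccc3333"]`, a request of `[0, "GPU-bbbb", "MIG-"]` would resolve to indices 0, 1 and 2 under these assumptions.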
Related Issue
Please link to the issue this PR resolves: #873
Motivation and Context
Per the above issue, we currently fail on parsing passed GPU ids when they are UUID prefixes.
This is especially a problem when the CUDA_VISIBLE_DEVICES variable is automatically set in some cases (I think this is what is happening in the linked issue on huggingface).
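To make the failure mode concrete, here is a rough sketch of tolerant parsing of a CUDA_VISIBLE_DEVICES-style value. The helper name `parse_visible_devices` is hypothetical and this is not the project's actual parser; the point is only that calling `int()` on every comma-separated entry raises `ValueError` as soon as an entry is a UUID prefix such as `GPU-8932f937`.

```python
import os
from typing import List, Optional, Union


def parse_visible_devices(value: Optional[str] = None) -> List[Union[int, str]]:
    """Split a CUDA_VISIBLE_DEVICES-style string into indices and UUID prefixes.

    Hypothetical helper, for illustration only. If `value` is None, the
    environment variable is read instead.
    """
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")

    entries: List[Union[int, str]] = []
    for raw in value.split(","):
        token = raw.strip()
        if not token:
            continue  # skip empty entries such as a trailing comma
        # Digits-only tokens are indices; everything else is kept as a
        # string so it can later be matched as a UUID prefix.
        entries.append(int(token) if token.isdigit() else token)
    return entries
```

Keeping non-numeric entries as strings defers the decision about what they match to a later step, instead of failing at parse time.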
How Has This Been Tested?
Added new unit tests, but as noted above I have not been able to reproduce the original
huggingface issue to verify.
Screenshots (if appropriate):
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist:
Go over all the following points, and put an x in all the boxes that apply.